if (!"pacman" %in% rownames(installed.packages())) install.packages("pacman")
library(pacman)
pacman::p_load(haven, hrbrthemes, kableExtra, knitr, labelled, modelsummary, openxlsx, tidyverse)Connect with me on Open Science Framework | Contact me via LinkedIn
It might be necessary to use right-click -> open in a new browser window depending on your machine.
R analysis script presenting part of the solutions for exercise on the hearing test (see last exercise).
The purpose of this script is not solely to answer the exercise question. Moreover, studying these scripts should help students become familiar with some aspects of working in R.
If this exercise captures your interest, please take a look at our master study programs at Otto von Guericke University in Magdeburg (Germany), by clicking on the logo:
1 Loading packages
R is a context-sensitive language. Thus, ‘data’ will not be interpreted the same way as ‘Data’.
In R most functionality is provided by additional packages.
Most of the packages are well-documented; see: https://cran.r-project.org/
The code chunk below first evaluates if the package pacman (Rinker & Kurkiewicz, 2018) has already been installed on your machine. If yes, the corresponding package will be loaded. If not, R will install the package.
Alternatively, you can do this manually first by executing install.packages(“pacman”) and then library(pacman).
The second line then loads the package pacman.
The third line uses the function p_load() from the pacman package to install (if necessary) and loads all packages that we provide as arguments.
In all code chunks throughout this script, you can get additional help on each function used by clicking on its name (or by right-clicking and then opening it in a new browser tab). Alternatively, you can see which arguments a function accepts while coding by pressing ‘F1’ with the cursor positioned over the function’s name.
Here is the R session info, which gives you information on my machine, all loaded packages, and their version:
R version 4.4.3 (2025-02-28 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 26100)
Matrix products: default
locale:
[1] LC_COLLATE=German_Germany.utf8 LC_CTYPE=German_Germany.utf8
[3] LC_MONETARY=German_Germany.utf8 LC_NUMERIC=C
[5] LC_TIME=German_Germany.utf8
time zone: Europe/Berlin
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] openxlsx_4.2.8 quarto_1.4.4 ggpubr_0.6.0 lubridate_1.9.4
[5] forcats_1.0.0 stringr_1.5.1 dplyr_1.1.4 purrr_1.0.2
[9] readr_2.1.5 tidyr_1.3.1 tibble_3.2.1 ggplot2_3.5.1
[13] tidyverse_2.0.0 modelsummary_2.2.0 labelled_2.14.0 knitr_1.49
[17] kableExtra_1.4.0 hrbrthemes_0.8.7 haven_2.5.4 downloadthis_0.4.1
[21] pacman_0.5.1
loaded via a namespace (and not attached):
[1] gtable_0.3.6 xfun_0.49 htmlwidgets_1.6.4
[4] processx_3.8.5 insight_1.0.1 rstatix_0.7.2
[7] tzdb_0.4.0 ps_1.8.1 vctrs_0.6.5
[10] tools_4.4.3 generics_0.1.3 pkgconfig_2.0.3
[13] data.table_1.16.4 lifecycle_1.0.4 compiler_4.4.3
[16] munsell_0.5.1 carData_3.0-5 fontquiver_0.2.1
[19] fontLiberation_0.1.0 htmltools_0.5.8.1 yaml_2.3.10
[22] Formula_1.2-5 Rttf2pt1_1.3.12 later_1.4.1
[25] pillar_1.10.1 car_3.1-3 extrafontdb_1.0
[28] abind_1.4-8 fontBitstreamVera_0.1.1 zip_2.3.1
[31] tidyselect_1.2.1 digest_0.6.37 stringi_1.8.4
[34] extrafont_0.19 fastmap_1.2.0 grid_4.4.3
[37] colorspace_2.1-1 cli_3.6.2 magrittr_2.0.3
[40] broom_1.0.7 withr_3.0.2 gdtools_0.4.1
[43] scales_1.3.0 backports_1.5.0 timechange_0.3.0
[46] rmarkdown_2.29 ggsignif_0.6.4 hms_1.1.3
[49] evaluate_1.0.3 viridisLite_0.4.2 rlang_1.1.4
[52] Rcpp_1.0.14 glue_1.8.0 xml2_1.3.6
[55] svglite_2.1.3 rstudioapi_0.17.1 jsonlite_1.8.9
[58] R6_2.5.1 tables_0.9.31 systemfonts_1.2.1
2 Importing the data
For your master’s thesis, it might be a good idea to replicate the IAT findings of Study 1 (D-score > 0, t-test). Based on a power analysis (Assumptions: 𝛼=5%, Power=0.8), how many subjects should you recruit for this study?
The data set is stored as an Microsoft Excel file (.xlsx) within the following directory: https://pub.ww.ovgu.de/lichters/smpi/data/hearing_test.xlsx
The data set’s name is hearing_test.xlsx.
We download the file to our current R project into an object named d. for this purpose, we draw on the function read.xlsx() from the package openxlsx (Schauberger & Walker, 2023).
Below, we tell the function to read the first sheet of the Excel file, check the names of the columns, and separate the non-standard names by an underscore.
d <- read.xlsx("https://pub.ww.ovgu.de/lichters/smpi/data/hearing_test.xlsx",
sheet = 1,
check.names = TRUE,
sep.names = "_") %>% as_tibble()If anything does not work out as expected on your machine, you can download the data by clicking on the following button.
Download the Excel file manually
3 Data inspection
Let’s take a look at the first few rows of the data set to get an idea of its structure.
d# A tibble: 27 × 4
Gender Age Frequency Year
<chr> <dbl> <dbl> <dbl>
1 f 21 16000 2022
2 f 23 16400 2022
3 m 25 15850 2022
4 m 26 16302 2022
5 f 21 16000 2022
6 f 22 18000 2022
7 f 28 18000 2022
8 f 26 13711 2022
9 f 25 14287 2022
10 f 24 16222 2022
# ℹ 17 more rows
4 Using linear regression to explain the highest perceivable frequency by age
Call:
lm(formula = d$Frequency ~ d$Age)
Residuals:
Min 1Q Median 3Q Max
-1867.49 -1003.37 -13.47 583.04 2991.46
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 22987.83 1315.29 17.477 1.58e-15 ***
d$Age -284.97 50.74 -5.617 7.63e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1204 on 25 degrees of freedom
Multiple R-squared: 0.5579, Adjusted R-squared: 0.5402
F-statistic: 31.55 on 1 and 25 DF, p-value: 7.629e-06
5 Is capability to hear high frequencies a function of the interaction between age and gender?
We can first visualize the relationship between the highest perceivable frequency and age for both sexes in isolation.
ggplot(d, aes(x = Age, y = Frequency, color = Gender)) +
geom_smooth(method = "lm", fill = "#AEC6D2", se = TRUE) +
geom_point(alpha = 0.8, shape = 21, size = 2, stroke = 1) +
theme_minimal() +
annotate(geom = "text", x = 30, y = 17000, label = paste(
"Model across both groups : ", "\n\n",
round(model_age$coefficients[1], digits = 2),
" - ",
abs(round(model_age$coefficients[2], digits = 2)),
"*years", "\n\n", "R²(adjusted) = ",
round(summary(model_age)$adj.r.squared, digits = 2)
), hjust = "left") +
theme(
plot.background = element_rect(fill = "#F2F6F8"),
panel.background = element_rect(fill = "grey98"),
panel.border = element_rect(colour = "black", fill = NA, size = 0.7),
text = element_text(size = 12),
axis.title = element_text(size = 13)
)Next, let’s try a regression that includes the interaction and main effects of age and gender. The question is whether the slopes for both sexes are different enough to suggest that they likely do not come from the same data-generating process.
model_moderation_regression <- lm(d$Frequency ~ d$Age*d$Gender)
summary(model_moderation_regression)
Call:
lm(formula = d$Frequency ~ d$Age * d$Gender)
Residuals:
Min 1Q Median 3Q Max
-2059.3 -495.6 199.6 553.2 2459.8
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 18761.7 3124.9 6.004 4.02e-06 ***
d$Age -115.1 129.4 -0.889 0.3831
d$Genderm 6422.2 3593.1 1.787 0.0871 .
d$Age:d$Genderm -237.4 143.0 -1.660 0.1105
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1164 on 23 degrees of freedom
Multiple R-squared: 0.6194, Adjusted R-squared: 0.5698
F-statistic: 12.48 on 3 and 23 DF, p-value: 4.763e-05
Alternatively, more insights can be obtained by using a step-wise regression approach. We use the modelsummary package (Arel-Bundock, 2022) to focus on the model differences, which provides a function with the same name.
Below, we establish a regression model with only participants’ gender and age as predictors. We then, apply the modelsummary function to compare all three models.
model_age_gender <- lm(d$Frequency ~ d$Age + d$Gender)
models <- list(
"Model 1" = model_age,
"Model 1" = model_age_gender,
"Model 3" = model_moderation_regression
)
modelsummary(models,
stars = FALSE,
statistic = c("std.error", "p.value"),
shape = term ~ model + statistic,
fmt = 3,
coef_rename = TRUE,
align = "lrrrrrrrrr",
gof_omit = "AIC|RMSE|Log",
notes = c("Notes: Dependent variable is highest perceivable frequency in Hertz, independent varibles are: age in years, gender (female vs. male), and their interaction. The upper table area shows unstandardized model coefficients [Est.], non-robust standard errors [S.E.] and p-values [p].")
)| Model 1 | Model 1 | Model 3 | |||||||
|---|---|---|---|---|---|---|---|---|---|
| Est. | S.E. | p | Est. | S.E. | p | Est. | S.E. | p | |
| Notes: Dependent variable is highest perceivable frequency in Hertz, independent varibles are: age in years, gender (female vs. male), and their interaction. The upper table area shows unstandardized model coefficients [Est.], non-robust standard errors [S.E.] and p-values [p]. | |||||||||
| (Intercept) | 22987.835 | 1315.289 | <0.001 | 23438.299 | 1401.129 | <0.001 | 18761.693 | 3124.925 | <0.001 |
| d$Age | -284.975 | 50.737 | <0.001 | -309.463 | 57.034 | <0.001 | -115.054 | 129.403 | 0.383 |
| d$Genderm | 523.309 | 552.349 | 0.353 | 6422.151 | 3593.084 | 0.087 | |||
| d$Age × d$Genderm | -237.378 | 142.990 | 0.110 | ||||||
| Num.Obs. | 27 | 27 | 27 | ||||||
| R2 | 0.558 | 0.574 | 0.619 | ||||||
| R2 Adj. | 0.540 | 0.538 | 0.570 | ||||||
| BIC | 467.5 | 469.8 | 470.0 | ||||||
| F | 31.547 | 16.158 | 12.479 | ||||||
Finally, we use a more prominent method to assess the moderating role of gender in the relationship between age and the highest perceivable frequency. We use the PROCESS macro (Hayes, 2022) to estimate the interaction effect, drawing on bootstrap samples.
Below, we load the macro from the internet.
source("https://pub.ww.ovgu.de/lichters/smpi/data/process.R")
********************* PROCESS for R Version 4.3.1 *********************
Written by Andrew F. Hayes, Ph.D. www.afhayes.com
Documentation available in Hayes (2022). www.guilford.com/p/hayes3
***********************************************************************
PROCESS is now ready for use.
Copyright 2020-2023 by Andrew F. Hayes ALL RIGHTS RESERVED
Workshop schedule at http://haskayne.ucalgary.ca/CCRAM
If anything does not work out as expected on your machine, you can download the data by clicking on the following button.
If you have downloaded manually, you need to execute source(process.R).
Below, we need to explicitly convert the variable gender into a factor and then into a labelled variable. This is because the PROCESS macro requires the moderator to be a numeric variable.
Subsequently, we run the PROCESS macro. The function requires the data set (data), the dependent variable (y), the independent variable (w), the moderator (w), the model number (which is 1), the centering method, the number of bootstrap samples (boot), and the confidence level (conf). We further set the seed to ensure reproducibility. In addition, we set the progress bar to FALSE.
d$Gender <- d$Gender %>% to_factor() %>% to_labelled()
process(
data = d,
y = "Frequency",
x = "Age",
w = "Gender",
model = 1,
center = 2, jn = 1,
moments = 1, modelbt = 1,
boot = 10000, seed = 54545, progress = F,
conf = 0.95
)
********************* PROCESS for R Version 4.3.1 *********************
Written by Andrew F. Hayes, Ph.D. www.afhayes.com
Documentation available in Hayes (2022). www.guilford.com/p/hayes3
***********************************************************************
Model : 1
Y : Frequency
X : Age
W : Gender
Sample size: 27
Custom seed: 54545
***********************************************************************
Outcome Variable: Frequency
Model Summary:
R R-sq MSE F df1 df2 p
0.7870 0.6194 1355418.7784 12.4787 3.0000 23.0000 0.0000
Model:
coeff se t p LLCI ULCI
constant 15461.0587 791.7364 19.5280 0.0000 13823.1577 17098.9597
Age 122.3236 265.8595 0.4601 0.6498 -427.6721 672.3193
Gender 364.6212 541.6891 0.6731 0.5076 -755.9955 1485.2380
Int_1 -237.3778 142.9900 -1.6601 0.1105 -533.1877 58.4320
Product terms key:
Int_1 : Age x Gender
Test(s) of highest order unconditional interaction(s):
R2-chng F df1 df2 p
X*W 0.0456 2.7559 1.0000 23.0000 0.1105
***********************************************************************
Bootstrapping in progress. Please wait.
********** BOOTSTRAP RESULTS FOR REGRESSION MODEL PARAMETERS **********
Outcome variable: Frequency
Coeff BootMean BootSE BootLLCI BootULCI
constant 15461.0587 15348.5690 1003.2394 13399.9685 17287.5010
Age 122.3236 85.3801 351.3782 -645.5762 705.5247
Gender 364.6212 432.9738 668.9488 -856.6951 1640.8268
Int_1 -237.3778 -221.5327 212.1079 -584.1524 222.7070
******************** ANALYSIS NOTES AND ERRORS ************************
Level of confidence for all confidence intervals in output: 95
Number of bootstraps for percentile bootstrap confidence intervals: 10000
NOTE: Confidence level restricted to between 50 and 99.9999%. 95% confidence is provided in output.
NOTE: The following variables were mean centered prior to analysis:
Age
NOTE: Due to estimation problems, some bootstrap samples had to be replaced.
The number of times this happened was: 7
References
Reuse
Copyright
Citation
@misc{lichters2025,
author = {Lichters, Marcel},
title = {Sensory {Marketing} \& {Product} {Innovation:} {Exercise} -
{Hearing} Test},
date = {2025-03-21},
url = {https://pub.ww.ovgu.de/lichters/smpi/exercise/Exercise_Hearing_test.html},
langid = {en}
}